Model-Based Hierarchical Clustering

Authors

  • Shivakumar Vaithyanathan
  • Byron Dom
Abstract

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure including number of clusters, the depth of the tree and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.
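As a rough illustration of how a marginal likelihood can decide whether two clusters should share a distribution, the sketch below computes the standard closed-form log marginal likelihood of token counts under a multinomial likelihood with a symmetric Dirichlet prior, and compares a shared model against two separate ones. This is not the objective function of the paper, whose exact form is not given in the abstract. The function names, the symmetric prior, and the value alpha = 1.0 are all assumptions made for the example.

```python
from math import lgamma


def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of a token sequence with the given counts.

    Assumes a multinomial likelihood and a symmetric Dirichlet(alpha) prior
    (an illustrative choice, not taken from the paper). The multinomial
    coefficient is omitted, so this scores one specific token sequence.
    """
    k = len(counts)
    total = sum(counts)
    result = lgamma(k * alpha) - lgamma(total + k * alpha)
    for n in counts:
        result += lgamma(n + alpha) - lgamma(alpha)
    return result


def merge_gain(counts_a, counts_b, alpha=1.0):
    """Log evidence for one shared distribution minus two separate ones.

    A positive value favours modelling both clusters with a common
    distribution over these features; a negative value favours keeping
    a distinct distribution in each cluster.
    """
    merged = [a + b for a, b in zip(counts_a, counts_b)]
    shared = log_marginal_likelihood(merged, alpha)
    separate = (log_marginal_likelihood(counts_a, alpha)
                + log_marginal_likelihood(counts_b, alpha))
    return shared - separate


# Hypothetical term counts for two clusters over a two-word vocabulary.
similar_gain = merge_gain([50, 50], [50, 50])    # same usage pattern
different_gain = merge_gain([90, 10], [10, 90])  # opposite usage patterns
```

With these hypothetical counts, the gain is positive when the two clusters use the words in the same proportions and negative when they use them in opposite proportions. The shared model has fewer free parameters, so the marginal likelihood prefers it unless the data clearly support separate distributions, which is the regularizing effect the abstract refers to.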





Publication date: 2000